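There is growing concern about the typically opaque decision-making of high-performance machine learning algorithms. Explaining the reasoning process in domain-specific terms is crucial for adoption in risk-sensitive domains such as healthcare. We argue that machine learning algorithms should be interpretable by design, and that the language in which these interpretations are expressed should depend on the domain and the task. Consequently, we base our model's prediction on a family of user-defined, task-specific binary functions of the data, each with a clear interpretation for the end user. We then minimize the expected number of queries needed for accurate prediction on any given input. Because the solution is generally intractable, we follow prior work and choose the queries sequentially based on information gain. In contrast to previous work, however, we need not assume that the queries are conditionally independent. Instead, we leverage a stochastic generative model (VAE) and an MCMC algorithm (Unadjusted Langevin) to select the most informative query about the input given the previous query-answer pairs. This enables the online determination of a query chain of whatever depth is required to resolve prediction ambiguities. Finally, experiments on vision and NLP tasks demonstrate the efficacy of our approach and its superiority over post-hoc explanations.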
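The sequential, information-gain-based query selection described above can be illustrated with a small sketch. It is not the authors' method: where the abstract uses a VAE and Unadjusted Langevin sampling to estimate the relevant posteriors, this sketch assumes a small table of samples that can be filtered exhaustively. The names `entropy`, `next_query`, `answers`, `labels`, and `history` are hypothetical.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of the empirical label distribution."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def next_query(answers, labels, history):
    """Pick the unasked binary query with the largest information gain.

    answers: (N, Q) array, answers[i, q] is the 0/1 answer of query q on sample i.
    labels:  (N,) array of class labels.
    history: dict {query index: observed answer} for the current input.
    """
    # Assumption: condition on the history by keeping only consistent samples.
    mask = np.ones(len(labels), dtype=bool)
    for q, a in history.items():
        mask &= answers[:, q] == a
    A, y = answers[mask], labels[mask]
    base = entropy(y)
    best_q, best_gain = None, -1.0
    for q in range(answers.shape[1]):
        if q in history:
            continue
        # Expected label entropy after observing the answer to query q.
        cond = 0.0
        for a in (0, 1):
            sel = A[:, q] == a
            if sel.any():
                cond += sel.mean() * entropy(y[sel])
        gain = base - cond
        if gain > best_gain:
            best_q, best_gain = q, gain
    return best_q, best_gain

# Toy data: query 0 determines the label, query 1 carries no information.
answers = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
labels = np.array([0, 0, 1, 1])
q, gain = next_query(answers, labels, history={})
```

Calling `next_query` again with the observed answer added to `history` yields the next link of the query chain; a stopping rule (for example, stopping once the remaining label entropy is small) is left out of this sketch.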
translated by Google Translate
The ability to identify and temporally segment fine-grained human actions throughout a video is crucial for robotics, surveillance, education, and beyond. Typical approaches decouple this problem by first extracting local spatiotemporal features from video frames and then feeding them into a temporal classifier that captures high-level temporal patterns. We introduce a new class of temporal models, which we call Temporal Convolutional Networks (TCNs), that use a hierarchy of temporal convolutions to perform fine-grained action segmentation or detection. Our Encoder-Decoder TCN uses pooling and upsampling to efficiently capture long-range temporal patterns, whereas our Dilated TCN uses dilated convolutions. We show that TCNs are capable of capturing action compositions, segment durations, and long-range dependencies, and are over an order of magnitude faster to train than competing LSTM-based Recurrent Neural Networks. We apply these models to three challenging fine-grained datasets and show large improvements over the state of the art.
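To make the role of dilation concrete, the following is a minimal sketch of a single dilated temporal convolution and of how stacking dilations widens the receptive field. It is an illustration only, not the paper's architecture: the causal (past-only) padding, the single channel, and the names `dilated_conv1d` and `receptive_field` are assumptions.

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """Causal 1-D dilated convolution of a (T,) signal with a (K,) filter.

    y[t] = sum_k w[k] * x[t - k * dilation], treating x as zero before t = 0.
    """
    T, K = len(x), len(w)
    y = np.zeros(T)
    for t in range(T):
        for k in range(K):
            s = t - k * dilation
            if s >= 0:
                y[t] += w[k] * x[s]
    return y

def receptive_field(kernel_size, dilations):
    """Number of input frames one output frame depends on after stacking layers."""
    return 1 + (kernel_size - 1) * sum(dilations)

x = np.arange(8, dtype=float)
y = dilated_conv1d(x, np.array([1.0, 1.0]), dilation=2)  # y[t] = x[t] + x[t - 2]
rf = receptive_field(kernel_size=2, dilations=[1, 2, 4, 8])
```

With dilations that double at each layer, the receptive field grows exponentially with depth while the number of filter weights grows only linearly, which is one way such a network can cover long-range temporal patterns.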
State-of-the-art object detectors are fast and accurate, but they require a large amount of well-annotated training data to perform well. However, obtaining a large number of training annotations specific to a particular task, i.e., fine-grained annotations, is costly in practice. In contrast, obtaining common-sense relationships from text, e.g., "a table-lamp is a lamp that sits on top of a table", is much easier. Additionally, common-sense relationships like "on-top-of" are easy to annotate in a task-agnostic fashion. In this paper, we propose a probabilistic model that uses such relational knowledge to transform an off-the-shelf detector of coarse object categories (e.g., "table", "lamp") into a detector of fine-grained categories (e.g., "table-lamp"). We demonstrate that our method, RelDetect, achieves performance competitive with fine-tuning-based state-of-the-art object detector baselines when an extremely small amount of fine-grained annotations is available ($0.2\%$ of the entire dataset). We also demonstrate that RelDetect is able to utilize the inherent transferability of relationship information to obtain better performance ($+5$ mAP points) than the above baselines on an unseen dataset (zero-shot transfer). In summary, we demonstrate the power of using relationships for object detection on datasets where fine-grained object categories can be linked to coarse-grained categories via suitable relationships.
A normalizing flow (NF) is a mapping that transforms a chosen probability distribution to a normal distribution. Such flows are a common technique used for data generation and density estimation in machine learning and data science. The density estimate obtained with an NF requires a change of variables formula that involves the computation of the Jacobian determinant of the NF transformation. In order to tractably compute this determinant, continuous normalizing flows (CNFs) estimate the mapping and its Jacobian determinant using a neural ODE. Optimal transport (OT) theory has been successfully used to assist in finding CNFs by formulating them as OT problems with a soft penalty for enforcing the standard normal distribution as a target measure. A drawback of OT-based CNFs is the addition of a hyperparameter, $\alpha$, that controls the strength of the soft penalty and requires significant tuning. We present JKO-Flow, an algorithm to solve OT-based CNFs without the need to tune $\alpha$. This is achieved by integrating the OT CNF framework into a Wasserstein gradient flow framework, also known as the JKO scheme. Instead of tuning $\alpha$, we repeatedly solve the optimization problem for a fixed $\alpha$, effectively performing a JKO update with a time-step $\alpha$. Hence we obtain a "divide and conquer" algorithm by repeatedly solving simpler problems instead of solving a potentially harder problem with large $\alpha$.
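For reference, the change of variables formula and a generic JKO step can be written out as follows. The notation is an assumption made for illustration: $f$ denotes the flow map, $v$ the velocity field of the neural ODE, $\rho_0$ the data density, $\rho_1$ the standard normal target, and $\mathcal{F}$ the functional being minimized. The JKO step is given in its textbook form, so the exact scaling used in the paper may differ.

```latex
% Density estimate under an invertible map f that sends data x to z = f(x)
\log \rho_0(x) = \log \rho_1\big(f(x)\big) + \log \left| \det \nabla f(x) \right|

% Continuous flow: z(0) = x, \ \partial_t z(t) = v\big(z(t), t\big), \ f(x) = z(T);
% the log-determinant is then obtained by integrating a trace along the trajectory
\log \left| \det \nabla f(x) \right| = \int_0^T \operatorname{tr}\!\big( \nabla v(z(t), t) \big) \, dt

% One JKO step with time-step \alpha, where W_2 is the 2-Wasserstein distance
\rho^{(k+1)} = \operatorname*{arg\,min}_{\rho} \; \mathcal{F}(\rho) + \frac{1}{2\alpha} \, W_2^2\big(\rho, \rho^{(k)}\big)
```

Read this way, each step solves a problem that stays close to the current density $\rho^{(k)}$, which matches the "divide and conquer" description above.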
Predictive monitoring is a subfield of process mining that aims to predict how a running case will unfold in the future. One of its main challenges is forecasting the sequence of activities that will occur from a given point in time, a task known as suffix prediction. Most approaches to the suffix prediction problem learn to predict only the next activity, and therefore do not learn from the whole suffix during the training phase. This paper proposes a novel architecture based on an encoder-decoder model with an attention mechanism that decouples the representation learning of the prefixes from the inference phase, predicting only the activities of the suffix. During the inference phase, this architecture is extended with a heuristic search algorithm that improves the selection of the activity for each index of the suffix. Our approach has been tested using 12 public event logs against 6 different state-of-the-art proposals, showing that it significantly outperforms these proposals.
Tourette Syndrome (TS) is a behavior disorder with onset in childhood, characterized by the expression of involuntary movements and sounds commonly referred to as tics. Behavioral therapy is the first-line treatment for patients with TS, and it helps patients raise awareness about tic occurrence as well as develop tic inhibition strategies. However, the limited availability of therapists and the difficulties of in-home follow-up work limit its effectiveness. An automatic tic detection system that is easy to deploy could alleviate the difficulties of home therapy by providing feedback to patients while they exercise tic awareness. In this work, we propose a novel architecture (T-Net) for automatic tic detection and classification from untrimmed videos. T-Net combines temporal detection and segmentation and operates on features that are interpretable to a clinician. We compare T-Net to several state-of-the-art systems working on deep features extracted from the raw videos, and T-Net achieves comparable performance in terms of average precision while relying on the interpretable features needed in clinical practice.
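The increasing complexity of cyber-physical systems (CPS) makes industrial automation challenging. Large amounts of data recorded by sensors need to be processed to adequately perform tasks such as fault diagnosis. A promising approach to dealing with this complexity is the concept of causality. However, most research on causality has focused on inferring causal relations between parts of an unknown system. Engineering uses causality in a fundamentally different way: complex systems are constructed by combining components with known, controllable behavior. Because CPS are constructed by the second approach, most data-based causality models are not suited to industrial automation. To bridge this gap, a unified causality model for the various application areas of industrial automation is proposed, which will allow better communication and better use of data across disciplines. The resulting model describes the behavior of CPS mathematically. Because the model is evaluated against the unique requirements of the application areas, it is shown that the unified causality model can serve as a basis for applying new, machine-learning-focused approaches in industrial automation.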
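We study the feature-based newsvendor problem, in which a decision-maker has access to historical data consisting of demand observations and exogenous features. In this setting, we investigate feature selection, aiming to derive sparse, explainable models with improved out-of-sample performance. So far, state-of-the-art methods have relied on regularization, which penalizes the number of selected features or the norm of the solution vector. As an alternative, we introduce a novel bilevel programming formulation. The upper-level problem selects a subset of features that minimizes an estimate of the out-of-sample cost of ordering decisions, based on a held-out validation set. The lower-level problem learns the optimal coefficients of the decision function on a training set, using only the features selected by the upper level. We present a mixed-integer linear program reformulation of the bilevel program, which can be solved to optimality with standard optimization solvers. Our computational experiments show that the method accurately recovers the ground truth in instances with only a few hundred observations. In contrast, regularization-based techniques often fail at feature recovery or require thousands of observations to obtain similar accuracy. Regarding out-of-sample generalization, we achieve improved or comparable cost performance.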
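The quantity that both levels of such a formulation work with can be sketched as follows: the newsvendor cost of an ordering decision, and a linear decision rule restricted to a selected subset of features. This is only an illustration under assumptions. The unit costs `b` and `h`, the function names, and the toy numbers are hypothetical, and the bilevel program and its mixed-integer reformulation are not reproduced here.

```python
import numpy as np

def newsvendor_cost(order, demand, b=3.0, h=1.0):
    """Average newsvendor cost of orders against observed demand.

    b: unit cost of unmet demand (underage); h: unit cost of leftover stock
    (overage). Both default values are illustrative assumptions.
    """
    short = np.maximum(demand - order, 0.0)
    excess = np.maximum(order - demand, 0.0)
    return float((b * short + h * excess).mean())

def linear_order(features, coef, selected):
    """Order quantities from a linear decision rule using only selected features."""
    return features[:, selected] @ coef[selected]

# Toy validation set: columns are intercept, one useful feature, one ignored feature.
features = np.array([[1.0, 2.0, 9.0],
                     [1.0, 4.0, -3.0]])
coef = np.array([1.0, 2.0, 0.5])            # hypothetical fitted coefficients
selected = np.array([True, True, False])    # upper-level choice: drop the third feature
demand = np.array([6.0, 7.0])

orders = linear_order(features, coef, selected)
cost = newsvendor_cost(orders, demand)
```

In the setting described above, the coefficients would come from fitting on the training set for the given `selected` mask, and the upper level would compare this validation cost across candidate masks.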
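Biclustering algorithms partition data and covariates simultaneously, providing new insights in several domains, such as analyzing gene expression to discover new biological functions. This paper develops a new model-free biclustering algorithm in abstract spaces using the notions of energy distance (ED) and maximum mean discrepancy (MMD), two distances between probability distributions that can handle complex data such as curves or graphs. The proposed method can learn more general and complex cluster shapes than most existing approaches in the literature, which usually focus on detecting differences in mean and variance. Although the biclustering configurations of our approach are constrained to create disjoint structures at the datum and covariate levels, the results are competitive. Assuming a proper kernel choice, our results are similar to those of state-of-the-art methods in their optimal scenarios, and they outperform those methods when cluster differences are concentrated in higher-order moments. The model's performance has been tested in several situations involving simulated and real-world datasets. Finally, new theoretical consistency results are established using tools from optimal transport theory.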
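The two distances named above can be estimated from samples as in the following sketch. It assumes Euclidean vectors and a Gaussian kernel purely for illustration, whereas the paper works in abstract spaces. The estimators are the plain plug-in (V-statistic) versions, and all function names are hypothetical.

```python
import numpy as np

def pairwise_dist(a, b):
    """Euclidean distances between the rows of a (n, d) and b (m, d)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def energy_distance(x, y):
    """Plug-in estimate of 2 E|X - Y| - E|X - X'| - E|Y - Y'|."""
    return (2 * pairwise_dist(x, y).mean()
            - pairwise_dist(x, x).mean()
            - pairwise_dist(y, y).mean())

def mmd2_gaussian(x, y, bandwidth=1.0):
    """Plug-in estimate of squared MMD with a Gaussian kernel (assumed choice)."""
    def k(a, b):
        return np.exp(-pairwise_dist(a, b) ** 2 / (2 * bandwidth ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 2))
y = x + 3.0                      # same shape, shifted mean
ed_same, ed_diff = energy_distance(x, x), energy_distance(x, y)
mmd_same, mmd_diff = mmd2_gaussian(x, x), mmd2_gaussian(x, y)
```

Both estimates are zero when the two samples coincide and grow as the samples separate, which is what allows them to serve as a clustering criterion. Because they compare whole distributions, they can also respond to differences beyond the mean and variance.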
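This work summarizes the IJCB Occluded Face Recognition Competition 2022 (IJCB-OCFR-2022), held as part of the 2022 International Joint Conference on Biometrics (IJCB 2022). OCFR-2022 attracted a total of 3 participating teams from academia. Eventually, six valid submissions were made and then evaluated by the organizers. The competition was held to address the challenge of face recognition in the presence of severe face occlusions. Participants were free to use any training data, and the test data were built by occluding parts of face images from a well-known dataset. The submitted solutions presented innovations and performed well against the considered baseline. A major output of this competition is a challenging, realistic, diverse, and publicly available occluded face recognition benchmark with a well-defined evaluation protocol.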